ABSTRACT

Cosmologists will soon be in a unique position. Observational noise will gradually be 
replaced by cosmic variance as the dominant source of uncertainty in an increasing 
number of observations. We reflect on the ramifications for the discovery and veri- 
fication of new models. If there are features in the full data set that call for a new 
model, there will be no subsequent observations to test that model's predictions. We 
give specific examples of the problem by discussing the pitfalls of model discovery by 
prior adjustment in the context of dark energy models and inflationary theories. We 
show how the gradual release of data can mitigate this difficulty, allowing anomalies to 
be identified, and new models to be proposed and tested. We advocate that observers 
plan for the frugal release of data from future cosmic variance limited observations. 



1 INTRODUCTION 



Cosmologists can only make observations on (or occasionally 
within) our past light cone. Whatever the reality of the
multiverse, we Earth-bound humans of the $T_{\rm CMB} = 2.725$ K era
have access to only a finite volume of space, containing finite 
energy and information. The exciting period in which we find 
ourselves learning more and more about this volume of ac- 
cessible space and its contents cannot last forever. While we 
are unlikely to gather all the existing information content of
the observable universe, we are already making substantial 
inroads on the information of cosmic significance. 

The most notable example of confronting the finite in- 
formation content of the universe is our measurements of the 
power in the lowest multipoles $C_\ell$ of the cosmic microwave
background (CMB) temperature anisotropies. Their statis- 
tical error bars are now smaller than the "cosmic variance" 
errors - the expected difference between what we measure 
for these multipoles and what we would measure if we could 
average over many independent horizon volumes. The range 
of $\ell$ for which this is true is increasing as the Wilkinson
Microwave Anisotropy Probe (WMAP) continues to report 
new results. This trend will accelerate as new experiments 
join the fray. (Though we could wait a few hundred million 
years to gain access to a mostly-independent last scattering 
surface.) 

The CMB temperature-temperature power spectrum is 
unlikely to be the last place where the finite universe limits 
cosmology. Astronomical surveys are already cataloguing an 
increasing fraction of all the structures within our past light 
cone. Redshifted hydrogen hyperfine instruments will even- 
tually extend the volume over which we map the structure 
of matter nearly out to the horizon. 

There are consequences to becoming a data limited science.
We upset the balance between applying the brain's
remarkable pattern-finding abilities and testing the
robustness of the patterns we discover. We may see patterns in
finite data, but, unable to collect new data, we have no way 
to confirm their reality, missing out on potentially signifi- 
cant discoveries. We risk falling for what particle physicists 
call "the look elsewhere effect", i.e. the spurious "discov- 
ery" of statistically significant anomalies which are merely 
the consequence of performing a large number of tests on 
the same data. A small fraction of those are bound to re- 
port significant "evidence" for unexpected features due to 
random noise. Unlike experimental scientists, we may no 
longer be able to collect data, form a new hypothesis, and 
test its predictions. Our ability to distinguish between sta- 
tistical fluctuations and real effects becomes limited. 
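
As a rough illustration of the look elsewhere effect, the following minimal Python sketch computes the chance that at least one of several tests on the same data crosses a significance threshold by luck alone. It assumes the tests are statistically independent, which searches on a shared data set generally are not; the function name and the choice of a two-sided 3 sigma threshold are made here purely for illustration.

```python
def prob_at_least_one_fluke(p_single: float, n_tests: int) -> float:
    """Chance that at least one of n independent tests is 'significant' by luck."""
    return 1.0 - (1.0 - p_single) ** n_tests


# p_single = 0.0027 is the two-sided tail probability of a 3 sigma Gaussian threshold.
P_THREE_SIGMA = 0.0027

for n_tests in (1, 10, 100, 1000):
    chance = prob_at_least_one_fluke(P_THREE_SIGMA, n_tests)
    print(f"{n_tests:5d} tests -> chance of a spurious 3 sigma anomaly: {chance:.3f}")
```

With enough tests a "3 sigma anomaly" becomes close to inevitable, which is why a candidate anomaly needs data that were not used to find it.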

Given that the challenge of finite data is upon us, our 
best hope is to devise strategies to minimize its effects. The 
approach that we shall explore and advocate is to simulate 
the cycle of data acquisition and analysis by being frugal. By 
allowing colleagues to see only subsets of the data, construct 
hypotheses based on them, then test those hypotheses on 
larger subsets, we can aim to avoid unexplained anomalies 
with untestable explanations. 

The benefits of frugality arise not from some magical 
improvements in the statistical power of the data, but from 
acknowledging and mitigating a basic human failing: over- 
confidence. Specifically, by assigning all probability to the 
set of physical models that we have thought about and con- 
sequently zero probability to all other models, we ignore 
that we may not have considered the correct model. Fru- 
gality allows us to redress those wrongs by admitting such 
models and testing their predictions on our remaining data. 
We examine the effects of (and several strategies for) divid- 
ing cosmological data into several pieces so that new models 
can be consistently explored. 
2 MODEL DISCOVERY

2.1 Bayesian model selection and prior updating 

We take a Bayesian outlook on hypothesis testing, as we
believe (and show below) that this closely reflects the way
we think about models. Another reason for being wary of
the usual (frequentist) practice of reporting p-values is that
the latter are not probabilities for hypotheses, despite
being commonly misinterpreted as such (Sellke et al. 2001;
Gordon & Trotta 2007). Suppose we have a model $M_0$ with
parameters $\theta_0$, that we wish to evaluate in light of data
$d$. Our updated state of belief in the model's parameters
is given by the posterior probability distribution function
(pdf) on $\theta_0$, obtained via Bayes' theorem:



$$ p(\theta_0 \mid d, M_0) = \frac{p(d \mid \theta_0, M_0)\, p(\theta_0 \mid M_0)}{p(d \mid M_0)}, \tag{1} $$



where $p(d \mid \theta_0, M_0)$ is the likelihood, $p(\theta_0 \mid M_0)$ the prior on
the parameters $\theta_0$, and $p(d \mid M_0)$ is the marginal likelihood
for $M_0$. Now suppose we notice a feature in the data that
is not reproduced by model $M_0$ (for example by computing
the doubt, as in Starkman et al. 2008). We invent a model
$M_1$ with parameters $\theta_1$ as an explanation for said feature
and compute the evidence for both models ($i = 0, 1$)



$$ p(d \mid M_i) = \int d\theta_i \, p(d \mid \theta_i, M_i)\, p(\theta_i \mid M_i). \tag{2} $$



Each model's posterior probability in light of $d$ is given
by $p(M_i \mid d) = p(d \mid M_i)\, p(M_i)/p(d)$. The Bayes factor
$B_{10} = p(d \mid M_1)/p(d \mid M_0)$, which updates the ratio of our
degrees of belief in the models, penalizes models that are
unnecessarily complex, for example because of an excessive
number of free parameters, automatically encapsulating
Occam's razor (see e.g. Trotta 2007a, 2008). In order to
increase confidence in the new model $M_1$, all that is required
is $B_{10} > 1$, i.e. that $M_1$ be a more "effective" description of
the presently available data. There is no dependence on the
model's predictivity for future observations.
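
The Occam's razor effect of the Bayes factor can be made concrete with a toy calculation. The sketch below is an illustration under assumptions that are not taken from the text: a single Gaussian measurement `x_obs` with known error `sigma`, a model $M_0$ that fixes the parameter at zero, and a model $M_1$ with a uniform prior of total width `prior_width` centred on zero. All names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Normal density N(x; mean, sigma)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, mean, sigma):
    """Normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def bayes_factor_10(x_obs, sigma, prior_width):
    """B_10 = p(d | M_1) / p(d | M_0) for the toy models described above."""
    # M_0: parameter fixed at 0, so the evidence is just the likelihood at 0.
    evidence_0 = gaussian_pdf(x_obs, 0.0, sigma)
    # M_1: likelihood averaged over a uniform prior on [-W/2, +W/2], as in Eq. (2).
    half = 0.5 * prior_width
    integral = gaussian_cdf(half, x_obs, sigma) - gaussian_cdf(-half, x_obs, sigma)
    evidence_1 = integral / prior_width
    return evidence_1 / evidence_0


# The same 3 sigma measurement, judged under a narrow and a very wide prior.
for width in (10.0, 1000.0):
    print(f"prior width {width:7.1f}: B_10 = {bayes_factor_10(3.0, 1.0, width):.3f}")
```

In this toy setting the same 3 sigma deviation favors $M_1$ when its prior is narrow and disfavors it when the prior is very wide, which is the penalty for wasted parameter space described above.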

In practice, a new model probably would not (and ar- 
guably should not) be accepted until it produces a cor- 
rect prediction for future data d' that differs from the old 
model's, thus enabling the models to be distinguished. For- 
mally, the models' relative posterior odds after seeing both 
sets of data are given by 



$$ \frac{p(M_1 \mid d, d')}{p(M_0 \mid d, d')} = \frac{p(d' \mid M_1)}{p(d' \mid M_0)} \, \frac{p(d \mid M_1)}{p(d \mid M_0)} \, \frac{p(M_1)}{p(M_0)}. \tag{3} $$



Before the data set $d$ came along, model $M_1$ was not even
on the table: $p(M_1) = 0$. The step of introducing $M_1$, while
absolutely crucial, formally requires the injection of an
infinite amount of information to raise $p(M_1)$ from 0 to a finite
value. This prior adjustment is on top of the change in
degree of belief coming from $d$. It amounts to using the data $d$
twice, first to introduce $M_1$ by adjusting its prior and then
to evaluate the evidence from $d$.

The duplicate use of the data $d$ leads to posterior odds
which can seriously overstate the statistical significance of
a new effect. We suggest instead "forgetting" the details of $d$,
compressing its information into a new non-zero (and still
subjective) prior $p(M_1)$, and then computing the posterior odds
arising solely from $d'$, i.e.

$$ \frac{p(M_1 \mid d, d')}{p(M_0 \mid d, d')} = \frac{p(d' \mid M_1)}{p(d' \mid M_0)} \, \frac{p(M_1)}{p(M_0)}. \tag{4} $$

If an unlimited amount of data is accessible and the anomaly
is correctly modelled by $M_1$, it is guaranteed to become
eventually favored by the Bayes factor, independent of the exact
choices of priors. With a finite, cosmic-variance-limited data
set there is no such guarantee; holding data back can only
increase the likelihood that $M_1$ is confirmed before the data
are exhausted, the more so the larger the fraction of unused
data in $d'$.
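
The difference between Eqs. (3) and (4) is simple arithmetic, sketched below in Python. The Bayes factors and prior odds are hypothetical numbers chosen only to illustrate the effect: a model invented to fit a feature in $d$ naturally scores well on $d$, while the held-back data $d'$ are mildly against it.

```python
def posterior_odds_reusing_d(bf_d, bf_d_prime, prior_odds):
    """Eq. (3): the prior odds are updated with Bayes factors from both d and d'."""
    return bf_d_prime * bf_d * prior_odds


def posterior_odds_frugal(bf_d_prime, prior_odds):
    """Eq. (4): d only motivates the prior odds; d' alone supplies the evidence."""
    return bf_d_prime * prior_odds


# Hypothetical inputs (not from the text).
bf_d = 100.0        # M_1 was built to explain d, so it fits d well
bf_d_prime = 0.5    # the fresh data d' slightly prefer M_0
prior_odds = 1.0    # subjective prior odds adopted after seeing d

odds_eq3 = posterior_odds_reusing_d(bf_d, bf_d_prime, prior_odds)
odds_eq4 = posterior_odds_frugal(bf_d_prime, prior_odds)
print(f"odds reusing d (Eq. 3): {odds_eq3}")
print(f"odds from d' only (Eq. 4): {odds_eq4}")
```

With these inputs the first prescription still reports odds of 50:1 in favor of $M_1$, whereas the second reports 1:2 against it.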



2.2 Examples of prior adjustments in cosmology 

Two notable examples in cosmology of devising new models
and then adjusting their priors are the discovery of dark
energy and the realization that inflation can easily
accommodate $\Omega < 1$.

The discovery of a non-zero, yet tiny cosmological constant
$\Lambda$ was in stark contradiction to prior expectations.
Particle-physics considerations suggested that $\Lambda$ should
either be 0 (model $M_1$) or have a uniform prior between $\pm M_p^4$
(model $M_2$), $p(\Lambda \mid M_1) = \delta(\Lambda)$,
$p(\Lambda \mid M_2) = \Theta(M_p^4 - |\Lambda|)/2M_p^4$, where $M_p$ is the
reduced Planck mass, $\Theta(x)$ is a step function and $\delta(x)$ is a
Dirac delta distribution. Oversimplifying history, let us assume
these were the only theories at hand, and had equal priors [1],
$p(M_1) = p(M_2) = 1/2$.

Along came supernova (SN) redshift measurements
(Perlmutter et al. 1999), suggesting a late time acceleration
of the universe driven by (in the simplest models) a small
$\Lambda / M_p^4 \approx 10^{-120}$. To simplify, let us assume that the available
SN data presented a $5\sigma$ deviation from $\Lambda = 0$. Computing
the Bayes factor using the Savage-Dickey density ratio
(Trotta 2007b) gives



$$ B_{12} = \frac{p(\Lambda = 0 \mid d, M_2)}{p(\Lambda = 0 \mid M_2)} \approx \frac{10^{\ldots}}{10^{\ldots}}. \tag{5} $$



Due to the strong Occam's razor effect of the prior on $M_2$,
a vanishing cosmological constant should have still been
vastly preferred, with odds of order $10^{\ldots} : 1$, over a model
including a hugely fine-tuned $\Lambda$. A $\sim 23\sigma$ detection of a
non-zero cosmological constant would have been required to
override the Occam's razor of the prior.
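
The orders of magnitude involved can be checked with a short calculation. The following Python sketch evaluates the Savage-Dickey ratio under assumptions that are ours, not necessarily those behind the numbers quoted in the text: the posterior for $\Lambda$ under $M_2$ is taken to be Gaussian, centred on $\Lambda = 10^{-120} M_p^4$ with standard deviation equal to that value divided by the detection significance, and the prior density at $\Lambda = 0$ is $1/2$ in units of $M_p^{-4}$. Everything is done in $\log_{10}$ to avoid overflow; the function names are hypothetical.

```python
import math

LOG10_LAMBDA_OBS = -120.0  # log10 of the measured Lambda in units of M_p^4 (from the text)


def log10_b12(n_sigma):
    """log10 of B_12 = p(Lambda=0 | d, M_2) / p(Lambda=0 | M_2), Gaussian posterior assumed."""
    # Gaussian posterior density at Lambda = 0, with sigma = Lambda_obs / n_sigma:
    #   n_sigma / (sqrt(2 pi) * Lambda_obs) * exp(-n_sigma^2 / 2)
    log10_posterior = (
        math.log10(n_sigma)
        - 0.5 * math.log10(2.0 * math.pi)
        - LOG10_LAMBDA_OBS
        - 0.5 * n_sigma**2 * math.log10(math.e)
    )
    # Uniform prior on [-1, +1] in units of M_p^4 has density 1/2 at Lambda = 0.
    log10_prior = math.log10(0.5)
    return log10_posterior - log10_prior


def breakeven_sigma(lo=1.0, hi=100.0, n_iter=100):
    """Significance at which B_12 = 1; log10_b12 decreases monotonically for n_sigma > 1."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if log10_b12(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(f"log10 B_12 at 5 sigma: {log10_b12(5.0):.1f}")
print(f"break-even significance: {breakeven_sigma():.1f} sigma")
```

Under these assumptions $\log_{10} B_{12}$ comes out at roughly 115 for a $5\sigma$ signal, and the break-even significance lies between $23\sigma$ and $24\sigma$, in line with the $\sim 23\sigma$ quoted above.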

However, the particle physics community started reconsidering
priors and developed a new model $M_3$ involving anthropic
reasoning which gave more weight to small values of $\Lambda$,
$p(\Lambda \mid M_3) = \Theta(10\Lambda_0 - \Lambda)/10\Lambda_0$ (with $\Lambda_0$ the observed
value), with model priors now $p(M_1) = p(M_2) = p(M_3) = 1/3$.
Under the new anthropic prior, the effect of Occam's razor is
vastly reduced, giving a Bayes factor $B_{13} \approx 10^{-4}$, now favoring
model $M_3$. The parameter value that was a priori considered
unnatural under the original model for a cosmological
constant (small non-zero $\Lambda$) described the data better
than the prevailing model of $\Lambda = 0$, but not sufficiently well
to be preferred. Introducing an anthropic model based on
the landscape picture in string theory (Bousso & Polchinski
2000; Giddings et al. 2002; Douglas 2003; Susskind 2003;
Starkman & Trotta 2006) allowed a small, non-zero cosmological
constant to become the preferred description of the
data, which has since been supported by other observations
such as CMB and baryon acoustic oscillations.

[1] An interesting suggestion for choosing models' priors based
on a Maximum Entropy argument has been put forward
by Brewer & Francis (2009).

It is interesting that ex post facto one might argue that
perhaps $\Lambda$ is restricted to be a positive quantity, in which
case the appropriate prior would be uniform in $\ln \Lambda$ rather
than in $\Lambda$ (Evrard & Coles 1995; Kirchner & Ellis 2003).
Under this model $M_4$, and assuming a cut-off
$\Lambda > \Lambda_{\min} = 10^{-500} M_p^4$ (see Starkman & Trotta 2006), one obtains a
Bayes factor $B_{14} \approx 10^{-2}$, i.e. moderate support for a non-zero
$\Lambda$, analogously to what can be obtained by anthropic arguments.

An earlier example of discovering a new model through
adjusting priors happened in the mid to late 90s. The
overwhelming evidence for $\Omega_{\rm tot} \sim 0.3 < 1$ posed a problem
for inflation, as it had been viewed to generically predict
a flat universe with $\Omega \sim 1$ to high accuracy - this
generally accepted model could not describe observations.
Different models (mostly using multiple stages of inflation)
were devised that produced open universes (Bucher et al.
1995). In other words, after observing that $\Omega \approx 0.3$, the
priors for single stage inflation, $p(M_0)$, and for multi stage
inflation, $p(M_1)$, were adjusted from $p(M_0) \gg p(M_1)$ to
$p(M_1) \approx p(M_0)$. The prediction for future observations -
corroborating evidence for $\Omega \sim 0.3$ - was proven wrong by
measurements of $\Omega \approx 1$ (Netterfield et al. 2002). The priors
were reverted back to $p(M_0) \gg p(M_1)$, making multi-stage
models all but obsolete.

Note that in both the above examples, it was crucial 
that predictions of the new model could be tested by follow- 
up independent observations which either confirmed or re- 
jected the new model. 



3 THE NEED FOR FRUGALITY 

With the launch of the Planck satellite, the power spectrum
of the temperature fluctuations, $C_\ell^{TT}$, will be limited by
cosmic variance all the way up to $\ell > 2000$. No future
observation will ever obtain more precise measurements of the CMB
temperature fluctuations in this $\ell$ range (barring problems
with unanticipated systematics), and higher $\ell$ ranges begin
to be dominated by foreground sources. If there are features
in the Planck data that cannot be adequately explained by
$\Lambda$CDM (such as a strong correlation between different
multipoles), we could and should devise a revised concordance
model. But we would be unable to test its predictions with
future CMB temperature measurements!

After the COBE experiment (Smoot et al. 1992)
observed hints of a low quadrupole, it took subsequent
confirming measurements by WMAP (Spergel et al. 2003, 2007;
Komatsu et al. 2009) to establish this and to detect the
planarity of the quadrupole and octopole and their alignment
with each other, perpendicular to the ecliptic, with an axis
toward the CMB dipole (de Oliveira-Costa et al. 2004;
Schwarz et al. 2004; Land & Magueijo 2005), where cosmic
variance already is the limiting factor. Thus possible new
models explaining the low-$\ell$ multipole alignments cannot be
tested on their predictions for future measurements of these
multipoles. Instead, one has to look for different predictions
from the new models, e.g. by looking for circles in the sky as
a signature of a topologically non-trivial universe (Cornish
et al. 2004). If only parts of the WMAP data had been released,
tantalizing enough to induce people to look for new models,
there would have been room to test the predictions of these
models for the low $\ell$s.

In the (perhaps not so distant) future, a similar situ- 
ation will arise with other cosmological experiments. Large 
scale structure observations by way of galaxy counts will 
eventually measure the positions and redshifts of all galaxies
in our Hubble patch with high precision (neglecting
uncertainties due to non-linearities). The distribution of
hydrogen will be mapped with observations of the Ly-$\alpha$ forest.
Eventually all observations on cosmological scales will reach 
the cosmic variance limit, as we only have this one universe 
from which to sample. 

In light of this, it seems imperative to reflect on ways 
to extract an optimal amount of information from complete 
finite data sets. They should not only be used to better
constrain parameters of the concordance model, but also to
discover and test new models. We need to devise schemes for
incremental data release as cosmological analogues of blind
analysis, a procedure often used in particle physics, where
the need to avoid the (possibly unconscious) influence of the
statistical methodology adopted on the significance of the
results is a well recognized problem, see e.g. Lyons (2008). For
example, one wants to avoid (unwittingly) biasing the
significance of a signal when designing the "cuts" on the number
of observed events. Several strategies have been devised to 
this end. For example, a random number can be added to 
the data, and subtracted only after all corrections and other 
data manipulations have been performed; or just a fraction 
of the data is employed to define the statistical procedure, 
while the remainder of the data are only revealed in a subse- 
quent phase. After that point no further adjustments of the 
methodology are allowed. The split of data in subsets can 
either happen in time (an obvious solution for many particle 
physics experiments) or in data space. In the latter case, a 
"signal box" of data is left closed until potential anomalies in 
the first chunk of observations have been identified and sta- 
tistical tests for their confirmation designed, at which point 
the box is opened and the analysis unblinded. An example
of such a procedure is the MiniBooNE neutrino oscillation
experiment (Bazarko 2001). Another method is sometimes
adopted by precision measurements where the analysis team 
is allowed to see the full data sets, but with arbitrary units. 
The resulting parameter constraints are rescaled to the ac- 
tual units only at the very end of the analysis. 
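
The first of these strategies, adding a hidden random number to the data, can be sketched in a few lines of Python. This is a toy illustration, not the procedure of any particular experiment: the measurements are made-up numbers, the "analysis" is just a mean, and the function names, seed and offset range are hypothetical choices.

```python
import random
import statistics


def blind(values, seed):
    """Add one hidden random offset to every value; return the blinded data and the offset."""
    rng = random.Random(seed)
    offset = rng.uniform(-10.0, 10.0)
    return [v + offset for v in values], offset


def unblind(estimate, offset):
    """Remove the hidden offset from an estimate that is linear in the data."""
    return estimate - offset


measurements = [4.9, 5.2, 5.0, 5.1, 4.8]  # hypothetical toy data

# The analysis team only ever sees the blinded values.
blinded, hidden_offset = blind(measurements, seed=1234)
blinded_mean = statistics.mean(blinded)

# Only after the analysis choices are frozen is the offset subtracted.
final_mean = unblind(blinded_mean, hidden_offset)
print(f"unblinded mean: {final_mean:.3f}")
```

A constant offset only works for estimators that shift with the data, such as the mean used here; other quantities need a blinding transformation matched to the analysis.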

All of those strategies are designed with the common 
aim of keeping a part of the information hidden from the 
first stage of the analysis, so as to be able to exploit the full 
statistical power of the hidden data upon unblinding. We 
now turn to the discussion of possible ways of applying this 
idea in the cosmological context. 



4 STRATEGIES FOR THE RELEASE OF 
PARTIAL DATA 

There is always a random element involved in choosing a 
good way to split data, where the definition of "good" of- 
ten depends on the unknown anomalies one is hoping to be 
able to test. Suppose we throw a single coin 2N times after
which it is lost. The first N throws include an equal number
of heads and tails, while the last N tosses are all tails. Split- 
ting this data set in these two chunks, the first set points 
towards the model of a fair coin. The second set (all tails) 
raises serious doubt about this model. But we have no way 
of verifying the predictions of a new model (e.g. the coin was 
exchanged for an all-tails coin) as the coin was lost. Had we 
split the data into four equal chunks, then after examining 
the third chunk we would likely have proposed a new model 
of an unfair all-tails coin. The predictions of this new model 
would have been tested (and confirmed) by the fourth chunk 
of data. 
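
The coin example can be played through numerically. The Python sketch below assumes N = 20 (so 40 tosses), a first half that simply alternates heads and tails, and two competing descriptions of the tosses after a possible swap: a fair coin and an all-tails coin. These specifics are illustrative assumptions; only the overall setup comes from the text.

```python
def likelihood(chunk, p_tails):
    """Probability of an observed sequence of 'H'/'T' given the tails probability."""
    tails = chunk.count("T")
    heads = len(chunk) - tails
    return p_tails**tails * (1.0 - p_tails) ** heads


N = 20
tosses = ["H", "T"] * (N // 2) + ["T"] * N   # first N balanced, last N all tails

# Split the 2N tosses into four equal chunks, released one at a time.
chunk_size = len(tosses) // 4
chunks = [tosses[i:i + chunk_size] for i in range(0, len(tosses), chunk_size)]

# Chunks 1 and 2 look fair; chunk 3 (all tails) prompts the all-tails model.
for index, chunk in enumerate(chunks[:3], start=1):
    print(f"chunk {index}: {chunk.count('T')} tails out of {len(chunk)}")

# Chunk 4 was never used to invent the new model, so it provides a genuine test.
test_chunk = chunks[3]
bayes_factor = likelihood(test_chunk, 1.0) / likelihood(test_chunk, 0.5)
print(f"Bayes factor (all-tails vs fair) from chunk 4 alone: {bayes_factor}")
```

With a two-way split the all-tails hypothesis would have been both suggested and "confirmed" by the same N tosses; with four chunks the last ten tosses supply an independent factor of 1024 in its favor.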

Two opposing forces are at play when considering ways 
to release partial data. On the one hand, releasing individ- 
ual data points will lead to many statistical flukes that can 
be mistaken for features in the data. On the other hand,
releasing all data at once will only allow us to determine the
parameters of the existing models and not to check predictions
of potential new models. It seems hard to find an optimal 
number of chunks, even more so as it is not even clear how 
data should be split. 

The most natural way to release partial data is often by
time ordering, such as is employed by many experiments,
for example WMAP. A natural cut-off between data sets
is the point in time when (if) the doubt (Starkman et al.
2008) on a concordance or reference model reaches a
critical threshold, after which an alternative model should be
devised. Using only data that was not used to compute the
doubt on the original model, compute the doubt on the new
model. Iterate this process until all data has been taken or
funding runs out. This method does not detect all features
as the likelihood function typically does not incorporate all
predictions of the original model. For example, the riddles of
why the two point correlation function of the temperature
fluctuations vanishes at separation angles larger than 60°
(Copi et al. 2006) and of the alignments of the quadrupole
and octopole (de Oliveira-Costa et al. 2004; Schwarz et al.
2004; Land & Magueijo 2005) would escape detection as the
likelihood function is insensitive to these features.

Summary statistics for CMB measurements often are
presented in the form of (binned) $C_\ell$'s building on isotropy
and Gaussianity of the $a_{\ell m}$'s. Other quantities, such as $C(\theta)$,
would work as well. A possible course of action would be to
exclusively release binned $C_\ell$'s in the first data release. Then
a search for deviations from the concordance model - new
features - could be conducted. If any unexpected features
are noted in the data, new models would be devised and
their predictions for the unbinned $C_\ell$'s could be compared
against the second, unbinned data release. One might
envision performing a finer graining of the binning process,
going from e.g. $\Delta\ell = 10$ bins in the first year to $\Delta\ell = 5$ bins
in the second year to $\Delta\ell = 1$ bins in the third year, or in
terms of the two-point function $C(\theta)$ using averaged values
over $\Delta\theta = 10°, 1°, 0.1°, \ldots$ for each release cycle. A
possible complication is the fact that the successive data releases
include the previous data and hence are correlated.
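
A progressively finer binning schedule of this kind is easy to sketch. The Python example below uses NumPy on a toy spectrum; the spectrum shape, the multipole range and the plain unweighted average inside each bin are assumptions made for illustration (a real analysis would weight the multipoles and propagate the covariance).

```python
import numpy as np


def bin_spectrum(ells, cls, delta_ell):
    """Average a power spectrum in consecutive bins of width delta_ell (unweighted)."""
    n_bins = len(ells) // delta_ell
    used = n_bins * delta_ell                      # drop an incomplete last bin, if any
    ell_centres = ells[:used].reshape(n_bins, delta_ell).mean(axis=1)
    binned_cls = cls[:used].reshape(n_bins, delta_ell).mean(axis=1)
    return ell_centres, binned_cls


# Toy spectrum on ell = 2 ... 101 (hypothetical, for illustration only).
ells = np.arange(2, 102)
cls = 1.0 / (ells * (ells + 1.0))

# Release schedule from the text: Delta ell = 10, then 5, then 1.
releases = {delta: bin_spectrum(ells, cls, delta) for delta in (10, 5, 1)}
for delta, (centres, binned) in releases.items():
    print(f"Delta ell = {delta:2d}: {len(binned)} band powers released")

# Each coarse bin is the average of the finer bins it contains, so a later
# release repeats (and is correlated with) the information already released.
coarse = releases[10][1]
fine = releases[5][1]
print(np.allclose(coarse, fine.reshape(-1, 2).mean(axis=1)))
```

The last line makes the complication noted above explicit: the $\Delta\ell = 10$ band powers can be rebuilt exactly from the $\Delta\ell = 5$ ones, so only the differences within each coarse bin are new information.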

However, there is a way to split data guaranteeing
uncorrelated data chunks: principal component analysis
(PCA) (Huterer & Starkman 2003). Each principal component,
i.e. eigenvector and eigenvalue of the data's covariance
matrix, is released separately, giving as many attempts
at finding new models as there are well-constrained principal
components. Their order seems to be a matter of taste.
Releasing the best-constrained component first would make it
easiest to detect any features, then using the less-well
constrained modes to verify any new model. If it produces no
hints of a new model, this procedure - as any splitting of
data - would not have any negative impact on parameter
estimation (as Bayesian updating of posterior pdfs does not care
about the order of the information being added).
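
The decorrelating property of the principal components can be verified directly. The sketch below uses NumPy's symmetric eigensolver on a small made-up covariance matrix and data vector; the numbers are hypothetical and the release order (smallest variance first) is just one of the choices discussed above.

```python
import numpy as np

# Hypothetical 3x3 data covariance matrix (symmetric, positive definite).
cov = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 2.0, 0.5],
    [0.2, 0.5, 3.0],
])
data = np.array([0.3, -1.2, 2.1])  # hypothetical data vector

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors as
# columns, so the first mode is the best constrained (smallest variance).
eigvals, eigvecs = np.linalg.eigh(cov)

# Amplitude of each principal component in the data.
modes = eigvecs.T @ data

# In the rotated basis the covariance is diagonal: the modes are uncorrelated.
rotated_cov = eigvecs.T @ cov @ eigvecs

for rank, (variance, amplitude) in enumerate(zip(eigvals, modes), start=1):
    print(f"release {rank}: amplitude {amplitude:+.3f}, variance {variance:.3f}")
```

Because the rotated covariance has no off-diagonal terms, each released mode carries information that is uncorrelated with the modes released before it, unlike the nested binned releases.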

Independent of how the data is split, sizing the indi- 
vidual chunks also seems to be rather an art. They should 
neither be too small, i.e. not so noisy as to induce spurious 
features, nor too large, or new models will not be testable. 
It may prove beneficial to release data chunks with the same 
information content, as measured e.g. by the mean square 
error or an information-theory based measure such as the 
Kullback-Leibler divergence. 



5 CONCLUSIONS 

Cosmologists are in a paradoxical situation. They strive to 
acquire data of the highest possible quality to constrain pa- 
rameters of their models as quickly as possible. But they 
should be open to new features in the data that are not 
predicted by current models, and hence to the possibility 
of having to devise new models and test their predictions. 
We have argued that for the latter step, availability of fresh 
data is crucial, which for cosmic variance limited data sets 
is simply not possible. We therefore propose that such ulti- 
mate data sets be treated as the precious resources they are 
and released slowly and carefully. 

We have discussed various strategies for parsing such 
data sets. It remains an art to find the optimal way to split 
data and release it, involving inevitably a certain degree of 
luck to detect unexpected features. It seems to us from this 
first overview that the most promising way of "dividing the 
plunder" is to employ a PCA decomposition of the data 
and release data parts of equal information content. This is 
a compromise between being able to find new features and 
having enough data left to reliably test possible new models. 
However, the best strategy is likely to depend heavily on 
the particular data set, and on the taste of the individual 
investigators. Wishing to avoid that basic human failing of 
over-confidence we acknowledge that there is a reasonable 
chance that we have overlooked the optimal strategy. 

We urge our observational colleagues to be frugal with 
their data. Slicing the data and doling it out slowly is in all 
of our long term best interests. 